67 research outputs found

    The Genome Russia Project: Closing the Largest Remaining Omission on the World Genome Map

    We are witnessing the great era of genome exploration of the world, as genetic variation in people is being detailed across multiple varied world populations in an effort unprecedented since the first human genome sequence appeared in 2001. However, these efforts have yet to produce a comprehensive mapping of humankind, because important regions of modern human civilization remain unexplored. The Genome Russia Project promises to fill one of the largest gaps, the expansive regions across the Russian Federation, informing not just medical genomics of the territories, but also the migration settlements of historic and pre-historic Eurasian peoples

    SmileFinder: A Resampling-Based Approach to Evaluate Signatures of Selection from Genome-Wide Sets of Matching Allele Frequency Data in Two or More Diploid Populations

    Background: Adaptive alleles may rise in frequency as a consequence of positive selection, creating a pattern of decreased variation in the neighboring loci, known as a selective sweep. When the region containing this pattern is compared to another population with no history of selection, a rise in variance of allele frequencies between populations is observed. One challenge presented by large genome-wide datasets is the ability to differentiate between patterns that are remnants of natural selection from those expected to arise at random and/or as a consequence of selectively neutral demographic forces acting in the population. Findings: SmileFinder is a simple program that looks for diversity and divergence patterns consistent with selection sweeps by evaluating allele frequencies in windows, including neighboring loci from two or more populations of a diploid species against the genome-wide neutral expectation. The program calculates the mean of heterozygosity and FST in a set of sliding windows of incrementally increasing sizes, and then builds a resampled distribution (the baseline) of random multi-locus sets matched to the sizes of sliding windows, using an unrestricted sampling. Percentiles of the values in the sliding windows are derived from the superimposed resampled distribution. The resampling can easily be scaled from 1 K to 100 M; the higher the number, the more precise the percentiles ascribed to the extreme observed values. Conclusions: The output from SmileFinder can be used to plot percentile values to look for population diversity and divergence patterns that may suggest past actions of positive selection along chromosome maps, and to compare lists of suspected candidate genes under random gene sets to test for the overrepresentation of these patterns among gene categories. Both applications of the algorithm have already been used in published studies. 
    Here we present a publicly available, open source program that will serve as a useful tool for preliminary scans of selection using worldwide databases of human genetic variation, as well as population datasets for many non-human species, from which such data is rapidly emerging with the advent of new genotyping and sequencing technologies
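
The windowing and resampling idea in the SmileFinder abstract can be sketched roughly as follows. This is a minimal illustration, not the SmileFinder program itself: the function names, the single fixed window size, and the reading of "unrestricted sampling" as drawing loci at random with replacement from the whole genome are all assumptions made for the example.

```python
import random
from bisect import bisect_left


def window_means(values, size):
    """Mean of a per-locus statistic in each sliding window of `size` loci."""
    if size < 1 or size > len(values):
        raise ValueError("window size must be between 1 and len(values)")
    total = sum(values[:size])
    means = [total / size]
    # Slide the window one locus at a time, updating the running sum.
    for i in range(size, len(values)):
        total += values[i] - values[i - size]
        means.append(total / size)
    return means


def resampled_baseline(values, size, n_resamples, rng):
    """Sorted means of random multi-locus sets matched to the window size.

    Assumption: loci are drawn with replacement from anywhere in the genome.
    """
    return sorted(
        sum(rng.choices(values, k=size)) / size for _ in range(n_resamples)
    )


def percentiles(observed, baseline):
    """Percentage of baseline values strictly below each observed window mean."""
    n = len(baseline)
    return [100.0 * bisect_left(baseline, x) / n for x in observed]


# Hypothetical per-locus heterozygosity with a run of depleted loci.
heterozygosity = [0.5] * 20 + [0.0] * 5 + [0.5] * 20
means = window_means(heterozygosity, size=5)
baseline = resampled_baseline(heterozygosity, 5, 1000, random.Random(0))
scores = percentiles(means, baseline)
```

Windows with unusually low heterozygosity fall at low percentiles of the baseline. The same three functions could be applied to per-locus FST, where high percentiles would be the values of interest, and repeated over a range of window sizes.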

    Genome-wide scans for footprints of natural selection

    Detecting recent selected ‘genomic footprints’ applies directly to the discovery of disease genes and to the imputation of the formative events that molded modern population genetic structure. The imprints of historic selection/adaptation episodes left in human and animal genomes allow one to interpret modern and ancestral gene origins and modifications. Current approaches to reveal selected regions applied in genome-wide selection scans (GWSSs) fall into eight principal categories: (I) phylogenetic footprinting, (II) detecting increased rates of functional mutations, (III) evaluating divergence versus polymorphism, (IV) detecting extended segments of linkage disequilibrium, (V) evaluating local reduction in genetic variation, (VI) detecting changes in the shape of the frequency distribution (spectrum) of genetic variation, (VII) assessing differentiation between populations (FST), and (VIII) detecting excess or decrease in admixture contribution from one population. Here, we review and compare these approaches using available human genome-wide datasets to provide independent verification (or not) of regions found by different methods and using different populations. The lessons learned from GWSSs will be applied to identify genome signatures of historic selective pressures on genes and gene regions in other species with emerging genome sequences. This would offer considerable potential for genome annotation in functional, developmental and evolutionary contexts

    Genomic Legacy of the African Cheetah, Acinonyx jubatus

    Background: Patterns of genetic and genomic variance are informative in inferring population history for human, model species and endangered populations. Results: Here the genome sequence of wild-born African cheetahs reveals extreme genomic depletion in SNV incidence, SNV density, SNVs of coding genes, MHC class I and II genes, and mitochondrial DNA SNVs. Cheetah genomes are on average 95 % homozygous compared to the genomes of the outbred domestic cat (24.08 % homozygous), Virunga Mountain Gorilla (78.12 %), inbred Abyssinian cat (62.63 %), Tasmanian devil, domestic dog and other mammalian species. Demographic estimators impute two ancestral population bottlenecks: one >100,000 years ago coincident with cheetah migrations out of the Americas and into Eurasia and Africa, and a second 11,084–12,589 years ago in Africa coincident with late Pleistocene large mammal extinctions. MHC class I gene loss and dramatic reduction in functional diversity of MHC genes would explain why cheetahs ablate skin graft rejection among unrelated individuals. Significant excess of non-synonymous mutations in AKAP4 (p<0.02), a gene mediating spermatozoon development, indicates cheetah fixation of five function-damaging amino acid variants distinct from AKAP4 homologues of other Felidae or mammals; AKAP4 dysfunction may cause the cheetah’s extremely high (>80 %) pleiomorphic sperm. Conclusions: The study provides an unprecedented genomic perspective for the rare cheetah, with potential relevance to the species’ natural history, physiological adaptations and unique reproductive disposition

    History Shaped the Geographic Distribution of Genomic Admixture on the Island of Puerto Rico

    Contemporary genetic variation among Latin American human groups reflects population migrations shaped by complex historical, social and economic factors. Consequently, admixture patterns may vary by geographic regions ranging from countries to neighborhoods. We examined the geographic variation of admixture across the island of Puerto Rico and the degree to which it could be explained by historic and social events. We analyzed a census-based sample of 642 Puerto Rican individuals who were genotyped for 93 ancestry informative markers (AIMs) to estimate African, European and Native American ancestry. Socioeconomic status (SES) data and geographic location were obtained for each individual. There was significant geographic variation of ancestry across the island. In particular, African ancestry demonstrated a decreasing East to West gradient that was partially explained by historical factors linked to the colonial sugar plantation system. SES also demonstrated a parallel decreasing cline from East to West. However, at a local level, SES and African ancestry were negatively correlated. European ancestry was strongly negatively correlated with African ancestry and therefore showed patterns complementary to African ancestry. By contrast, Native American ancestry showed little variation across the island and across individuals and appears to have played little social role historically. The observed geographic distributions of SES and genetic variation relate to historical social events and mating patterns, and have substantial implications for the design of studies in the recently admixed Puerto Rican population. More generally, our results demonstrate the importance of incorporating social and geographic data with genetics when studying contemporary admixed populations

    Identifying Selected Regions from Heterozygosity and Divergence Using a Light-Coverage Genomic Dataset from Two Human Populations

    When a selective sweep occurs in the chromosomal region around a target gene in two populations that have recently separated, it produces three dramatic genomic consequences: 1) decreased multi-locus heterozygosity in the region; 2) elevated or diminished genetic divergence (FST) of multiple polymorphic variants adjacent to the selected locus between the divergent populations, due to the alternative fixation of alleles; and 3) a consequent regional increase in the variance of FST (S²FST) for the same clustered variants, due to the increased alternative fixation of alleles in the loci surrounding the selection target. In the first part of our study, to search for potential targets of directional selection, we developed and validated a resampling-based computational approach; we then scanned an array of 31 different-sized moving windows of SNP variants (5–65 SNPs) across the human genome in a set of European and African American population samples with 183,997 SNP loci after correcting for the recombination rate variation. The analysis revealed 180 regions of recent selection with very strong evidence in either population or both. In the second part of our study, we compared the newly discovered putative regions to those sites previously postulated in the literature, using methods based on inspecting patterns of linkage disequilibrium, population divergence and other methodologies. The newly found regions were cross-validated with those found in nine other studies that have searched for selection signals. Our study was replicated especially well in those regions confirmed by three or more studies. These validated regions were independently verified, using a combination of different methods and different databases in other studies, and should include fewer false positives.
    The main strength of our analysis method compared to others is that it does not require dense genotyping and therefore can be used with data from population-based genome SNP scans from smaller studies of humans or other species

    Whole genome scan reveals the genetic signature of African Ankole cattle breed and potential for higher quality beef

    BACKGROUND: Africa is home to numerous cattle breeds whose diversity has been shaped by subtle combinations of human and natural selection. African Sanga cattle are an intermediate type of cattle resulting from interbreeding between Bos taurus and Bos indicus subspecies. Recently, research has asserted the potential of Sanga breeds for commercial beef production with better meat quality as compared to Bos indicus breeds. Here, we identified meat quality related gene regions that are positively selected in Ankole (Sanga) cattle breeds as compared to indicus (Boran, Ogaden, and Kenana) breeds using cross-population (XP-EHH and XP-CLR) statistical methods. RESULTS: We identified 238 (XP-EHH) and 213 (XP-CLR) positively selected genes, of which 97 were detected from both statistics. Among the genes obtained, we primarily reported those involved in different biological process and pathways associated with meat quality traits. Genes (CAPZB, COL9A2, PDGFRA, MAP3K5, ZNF410, and PKM2) involved in muscle structure and metabolism affect meat tenderness. Genes (PLA2G2A, PARK2, ZNF410, MAP2K3, PLCD3, PLCD1, and ROCK1) related to intramuscular fat (IMF) are involved in adipose metabolism and adipogenesis. MB and SLC48A1 affect meat color. In addition, we identified genes (TIMP2, PKM2, PRKG1, MAP3K5, and ATP8A1) related to feeding efficiency. Among the enriched Gene Ontology Biological Process (GO BP) terms, actin cytoskeleton organization, actin filament-based process, and protein ubiquitination are associated with meat tenderness whereas cellular component organization, negative regulation of actin filament depolymerization and negative regulation of protein complex disassembly are involved in adipocyte regulation. The MAPK pathway is responsible for cell proliferation and plays an important role in hyperplastic growth, which has a positive effect on meat tenderness. 
CONCLUSION: Results revealed several candidate genes positively selected in Ankole cattle in relation to meat quality characteristics. The genes identified are involved in muscle structure and metabolism, and adipose metabolism and adipogenesis. These genes help in the understanding of the biological mechanisms controlling beef quality characteristics in African Ankole cattle. These results provide a basis for further research on the genomic characteristics of Ankole and other Sanga cattle breeds for quality beef. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12863-016-0467-1) contains supplementary material, which is available to authorized users

    Peeling Back the Evolutionary Layers of Molecular Mechanisms Responsive to Exercise-Stress in the Skeletal Muscle of the Racing Horse

    The modern horse (Equus caballus) is the product of over 50 million yrs of evolution. The athletic abilities of the horse have been enhanced during the past 6000 yrs under domestication. Therefore, the horse serves as a valuable model to understand the physiology and molecular mechanisms of adaptive responses to exercise. The structure and function of skeletal muscle show remarkable plasticity to the physical and metabolic challenges following exercise. Here, we reveal an evolutionary layer of responsiveness to exercise-stress in the skeletal muscle of the racing horse. We analysed differentially expressed genes and their co-expression networks in a large-scale RNA-sequence dataset comparing expression before and after exercise. By estimating genome-wide dN/dS ratios using six mammalian genomes, and FST and iHS using re-sequencing data derived from 20 horses, we were able to peel back the evolutionary layers of adaptations to exercise-stress in the horse. We found that the oldest and thickest layer (dN/dS) consists of system-wide tissue and organ adaptations. We further find that, during the period of horse domestication, the older layer (FST) is mainly responsible for adaptations to inflammation and energy metabolism, and the most recent layer (iHS) for neurological system process, cell adhesion, and proteolysis.

    Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel

    A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNP and bi-allelic indels we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants. © 2014 Macmillan Publishers Limited. All rights reserved

    Detecting loci under recent positive selection in dairy and beef cattle by combining different genome-wide scan methods

    As the methodologies available for the detection of positive selection from genomic data vary in terms of assumptions and execution, weak correlations are expected among them. However, if a given signal is consistently supported across different methodologies, it is strong evidence that the locus has been under past selection. In this paper, a straightforward frequentist approach based on the Stouffer method to combine P-values across different tests for evidence of recent positive selection in common variations, as well as strategies for extracting biological information from the detected signals, are described and applied to high-density single nucleotide polymorphism (SNP) data generated from dairy and beef cattle (taurine and indicine). The ancestral Bovinae allele state of over 440,000 SNPs is also reported. Using this combination of methods, highly significant (P < 3.17×10⁻⁷) population-specific sweeps pointing to candidate genes and pathways that may be involved in beef and dairy production were identified. The most significant signal was found in the Cornichon homolog 3 gene (CNIH3) in Brown Swiss (P = 3.82×10⁻¹²), and may be involved in the regulation of the pre-ovulatory luteinizing hormone surge. Other putative pathways under selection are glycolysis/gluconeogenesis, transcription machinery and chemokine/cytokine activity in Angus; the calpain-calpastatin system and ribosome biogenesis in Brown Swiss; and ganglioside deposition in milk fat globules in Gyr. The composite method, combined with the strategies applied to retrieve functional information, may be a useful tool for surveying genome-wide selective sweeps and providing insights into the source of selection
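
For readers unfamiliar with the Stouffer method mentioned in this abstract, the textbook form of the calculation can be sketched as below. This is an illustration only and not the authors' pipeline: the function name, the optional weights, and the assumption that every input is a one-sided P-value pointing in the same direction are choices made for the example.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def stouffer_combine(p_values, weights=None):
    """Combine one-sided P-values into a (Z, P) pair with Stouffer's method."""
    if not p_values:
        raise ValueError("need at least one P-value")
    if weights is None:
        weights = [1.0] * len(p_values)  # unweighted form
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")

    # Keep P-values strictly inside (0, 1) so the inverse CDF is defined.
    clipped = [min(max(p, 1e-300), 1.0 - 1e-15) for p in p_values]

    # A small P-value maps to a large positive Z-score.
    z_scores = [-_STD_NORMAL.inv_cdf(p) for p in clipped]

    # Weighted sum of Z-scores, rescaled to unit variance under the null.
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    combined_p = _STD_NORMAL.cdf(-combined_z)  # upper-tail probability
    return combined_z, combined_p


# Hypothetical P-values for one locus from three different selection tests.
z, p = stouffer_combine([0.01, 0.04, 0.20])
```

The combination assumes the individual tests are independent. When they are correlated, as different selection statistics computed on the same data often are, the combined P-value from this simple form is likely to be too small.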